
    The Influence Function of Penalized Regression Estimators

    To perform regression analysis in high dimensions, lasso or ridge estimation is a common choice. However, it has been shown that these methods are not robust to outliers. Therefore, alternatives such as penalized M-estimation or the sparse least trimmed squares (LTS) estimator have been proposed. The robustness of these regression methods can be measured with the influence function, which quantifies the effect of infinitesimal perturbations in the data. Furthermore, it can be used to compute the asymptotic variance and the mean squared error. In this paper we compute the influence function, the asymptotic variance and the mean squared error for penalized M-estimators and the sparse LTS estimator. The asymptotic biasedness of the estimators makes the calculations nonstandard. We show that only M-estimators with a loss function with a bounded derivative are robust against regression outliers. In particular, the lasso has an unbounded influence function. Comment: appears in Statistics: A Journal of Theoretical and Applied Statistics, 201
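
    The conclusion above hinges on the derivative psi = rho' of the loss function. A minimal sketch, assuming the standard result for unpenalized M-regression that an observation's influence scales with psi(residual) (the paper's penalized derivation is more involved and is not reproduced here), compares the unbounded psi of the squared-error loss used by the lasso with the bounded Huber psi and the redescending Tukey bisquare psi. The tuning constants 1.345 and 4.685 are conventional choices, not values taken from the paper.

```python
import numpy as np

# psi is the derivative of the loss rho. For unpenalized regression
# M-estimators, an observation's influence enters through psi(residual),
# so a bounded psi limits the effect of a single large residual.

def psi_squared(r):
    """Derivative of rho(r) = r**2 / 2 (least squares / lasso loss): unbounded."""
    return np.asarray(r, dtype=float)

def psi_huber(r, k=1.345):
    """Derivative of the Huber loss: identity near 0, clipped at +/- k."""
    return np.clip(np.asarray(r, dtype=float), -k, k)

def psi_bisquare(r, c=4.685):
    """Derivative of Tukey's bisquare loss: redescends to 0 for |r| > c."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= c, r * (1 - (r / c) ** 2) ** 2, 0.0)

residuals = np.array([0.5, 2.0, 10.0, 1000.0])
for name, psi in [("squared", psi_squared), ("huber", psi_huber), ("bisquare", psi_bisquare)]:
    print(name, psi(residuals))
```

    Only the squared-error psi keeps growing with the residual, which is the mechanism behind an unbounded influence function.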

    The shooting S-estimator for robust regression

    To perform multiple regression, the least squares estimator is commonly used. However, this estimator is not robust to outliers. Therefore, robust methods such as S-estimation have been proposed. These estimators flag any observation with a large residual as an outlier and downweight it in the further procedure. However, a large residual may be caused by an outlier in only a single predictor variable, and downweighting the complete observation results in a loss of information. Therefore, we propose the shooting S-estimator, a regression estimator that is especially designed for situations where a large number of observations suffer from contamination in a small number of predictor variables. The shooting S-estimator combines the ideas of the coordinate descent algorithm with simple S-regression, which makes it robust against componentwise contamination, at the cost of losing the regression equivariance property.
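
    The coordinate descent ("shooting") idea can be sketched as a loop that refits one slope at a time on partial residuals. This is only a skeleton under stated assumptions: there is no intercept, and the per-coordinate fitter is pluggable. The shooting S-estimator would plug in a simple S-regression there; this sketch does not implement S-regression and instead demonstrates the loop with an ordinary least-squares slope (ls_slope, a hypothetical helper), which is not robust.

```python
import numpy as np

def shooting_fit(X, y, simple_fit, n_iter=100, tol=1e-8):
    """Coordinate-descent skeleton: cycle over predictors and refit each slope
    on the partial residuals, keeping the other coefficients fixed.

    simple_fit(x_j, r) must return the slope of a regression of r on x_j
    through the origin (no intercept in this sketch).
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(p):
            # partial residual: remove the fit of every predictor except j
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = simple_fit(X[:, j], r_j)
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta

def ls_slope(x, r):
    """Least-squares slope through the origin (NOT robust; placeholder only)."""
    return float(x @ r / (x @ x))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5])
print(shooting_fit(X, y, ls_slope))
```

    With the least-squares fitter the loop converges to the ordinary least squares solution. Because each coordinate is fitted separately, a robust univariate fitter only needs to discount the contaminated cells of one predictor at a time, which is the intuition behind robustness to componentwise contamination.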

    robustHD: An R package for robust regression with high-dimensional data

    An Object-Oriented Framework for Statistical Simulation: The R Package simFrame

    Simulation studies are widely used by statisticians to gain insight into the quality of developed methods. Usually, some guidelines regarding, e.g., simulation designs, contamination, missing data models or evaluation criteria are necessary in order to draw meaningful conclusions. The R package simFrame is an object-oriented framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with minimal programming effort. Its object-oriented implementation provides clear interfaces for extensions by the user. Since statistical simulation is an embarrassingly parallel process, the framework supports parallel computing to increase computational performance. Furthermore, an appropriate plot method is selected automatically depending on the structure of the simulation results. In this paper, the implementation of simFrame is discussed in great detail and the functionality of the framework is demonstrated in examples for different simulation designs.
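
    The kind of design described above (data generation, contamination, repeated evaluation of estimators) can be sketched in a few lines. simFrame itself is an R package; the Python class and function names below (ContaminationControl, run_simulation) are hypothetical and are not simFrame's API. The sketch compares the mean and the median under 10% point-mass contamination; each replication is independent, which is what makes such a loop embarrassingly parallel.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ContaminationControl:
    """Hypothetical control object: replace a fraction `epsilon` of observations by `value`."""
    epsilon: float
    value: float

    def apply(self, x, rng):
        x = x.copy()
        n_out = int(round(self.epsilon * len(x)))
        idx = rng.choice(len(x), size=n_out, replace=False)
        x[idx] = self.value
        return x

def run_simulation(n_reps, n, cont, estimators, seed=0):
    """Each replication: generate N(0, 1) data, contaminate it, apply every estimator.
    Replications do not depend on each other, so they could be distributed across workers."""
    rng = np.random.default_rng(seed)
    results = {name: np.empty(n_reps) for name in estimators}
    for rep in range(n_reps):
        x = cont.apply(rng.standard_normal(n), rng)
        for name, est in estimators.items():
            results[name][rep] = est(x)
    return results

res = run_simulation(200, 100, ContaminationControl(0.1, 50.0),
                     {"mean": np.mean, "median": np.median})
for name, vals in res.items():
    print(name, "bias:", vals.mean())  # the true location is 0
```

    Keeping contamination in its own object mirrors the separation of concerns the abstract describes: the same estimators can be rerun under a different contamination or missing-data model by swapping one object.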

    Sparse least trimmed squares regression.

    Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. Both the simulation study and the real data example show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
    Keywords: Breakdown point; Outliers; Penalized regression; Robust regression; Trimming
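
    The estimator described above minimizes a trimmed sum of squared residuals plus an L1 penalty. A minimal sketch, assuming the objective sum of the h smallest squared residuals + h * lam * ||beta||_1, no intercept, and a single starting point, shows one concentration step (C-step) of the kind used for LTS: keep the h best-fitting observations and refit a lasso on them. The lasso is solved by plain coordinate descent, warm-started at the current estimate; this is an illustration, not the paper's algorithm or the robustHD implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, penalty, beta0, n_iter=200):
    """Coordinate descent for sum((y - X b)**2) + penalty * sum(|b|), warm-started
    at beta0; each coordinate update cannot increase the objective."""
    beta = beta0.copy()
    col_ss = (X ** 2).sum(axis=0)
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r += X[:, j] * beta[j]                       # partial residual without j
            beta[j] = soft_threshold(X[:, j] @ r, penalty / 2) / col_ss[j]
            r -= X[:, j] * beta[j]
    return beta

def sparse_lts_objective(X, y, beta, h, lam):
    """Sum of the h smallest squared residuals plus h * lam * ||beta||_1."""
    r2 = np.sort((y - X @ beta) ** 2)
    return r2[:h].sum() + h * lam * np.abs(beta).sum()

def c_step(X, y, beta, h, lam):
    """Keep the h observations with the smallest squared residuals and refit the lasso."""
    H = np.argsort((y - X @ beta) ** 2)[:h]
    return lasso_cd(X[H], y[H], h * lam, beta)

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
y[:10] += 20.0                    # ten vertical outliers
h, lam = 75, 0.05
beta = np.zeros(p)
for _ in range(10):
    beta = c_step(X, y, beta, h, lam)
print(np.round(beta, 2))
```

    The objective can only decrease across C-steps: the warm-started lasso does not increase it on the current subset, and re-selecting the h smallest residuals does not increase it either.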

    Generating a Close-to-Reality Synthetic Population of Ghana

    The purpose of this research is to generate a close-to-reality synthetic human population for use in a geosimulation of urban dynamics. Two commonly accepted approaches to generating synthetic human populations are Iterative Proportional Fitting (IPF) and Resampling with Replacement. While these methods are effective at reproducing one instance of the probability model describing the survey, that instance has extremely small variability among subgroups and is very unlikely to be the real population. IPF and Resampling with Replacement also rely on pure replication of units from the underlying sample, which can increase unrealistic model behavior. In this work we present a sequential logic for estimating variables using multinomial logistic regressions and the conditional probabilities among the variables, in order to generate combinations which were not represented in the original survey but are likely to occur in the real population. We also present a model-based approach to imputing missing responses and apply the methodology to the Ghana Living Standards Survey 5 (GLSS5) in order to generate a comprehensive synthetic population for the Republic of Ghana, including household and person variables such as household size, tribal affiliation, educational attainment and annual income. The R language and environment for statistical computing was used, together with the packages VIM and simPopulation, to develop and execute the code. Contingency coefficients, cumulative distributions, mosaic plots and box plots are presented to demonstrate the effectiveness of the new method in its application to Ghana.
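
    The sequential logic described above can be sketched as: fit a multinomial logistic regression for each new variable given the variables already generated, then draw the synthetic values from the predicted probabilities. This is a toy illustration under loudly stated assumptions: the "survey" is simulated, the variable names are hypothetical, the first variable is simply resampled, and the model is fitted by ridge-penalized gradient descent rather than by the R tools the authors used. Because the logit model is additive, it assigns positive probability to combinations that never occur in the toy survey.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def design(codes_list, levels_list):
    """Intercept plus dummy coding (first level as reference) for each categorical predictor."""
    cols = [np.ones((len(codes_list[0]), 1))]
    for codes, k in zip(codes_list, levels_list):
        cols.append(np.eye(k)[codes][:, 1:])
    return np.hstack(cols)

def fit_mlogit(X, y, k, lr=0.3, n_iter=3000, l2=1e-3):
    """Multinomial logistic regression fitted by ridge-penalized gradient descent."""
    W = np.zeros((X.shape[1], k))
    Y = np.eye(k)[y]
    for _ in range(n_iter):
        W -= lr * (X.T @ (softmax(X @ W) - Y) / len(y) + l2 * W)
    return W

def draw(P, rng):
    """Draw one category per row of the probability matrix P (inverse-CDF sampling)."""
    u = rng.random(len(P))[:, None]
    return np.minimum((P.cumsum(axis=1) < u).sum(axis=1), P.shape[1] - 1)

# Toy "survey" (hypothetical): region (3 levels), household size class (3), education (4)
rng = np.random.default_rng(0)
n = 300
region = rng.integers(0, 3, n)
size = (region + rng.integers(0, 2, n)) % 3
educ = (region + size + rng.integers(0, 2, n)) % 4

# Sequential generation: resample region, then size | region, then educ | region, size
N = 2000
syn_region = rng.choice(region, N)
W_size = fit_mlogit(design([region], [3]), size, 3)
P_size = softmax(design([syn_region], [3]) @ W_size)
syn_size = draw(P_size, rng)
W_educ = fit_mlogit(design([region, size], [3, 3]), educ, 4)
P_educ = softmax(design([syn_region, syn_size], [3, 3]) @ W_educ)
syn_educ = draw(P_educ, rng)

seen = set(zip(region.tolist(), size.tolist(), educ.tolist()))
new = set(zip(syn_region.tolist(), syn_size.tolist(), syn_educ.tolist())) - seen
print(len(new), "combinations not present in the toy survey")
```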

    Economic analysis of site-specific wheat management with respect to grain quality and separation of the different quality fractions

    The paper analyzes site-specific and uniform management options for wheat production with respect to grain quality. Besides site-specific fertilization, the economic potential of segregating different grain qualities is the subject of this paper. Yield and quality responses to fertilizer were taken from field experiments in Germany to calculate site-specific response functions. The economic optima were calculated for uniform management (UM), completely separate management of the subfields (SM), site-specific fertilization (SSF) and grain segregation (GS) under different price structures for different grain qualities. The results show that, across all price structures, the highest economic potential was found with SM or SSF compared to UM. However, these management practices require the possibility to manage subfields separately (SM) or specific fertilization equipment and fertilizer algorithms (SSF). GS did not have a higher economic potential than UM. However, if the required grain qualities are not met for the whole field, GS can substantially reduce profit losses by separating part of the grain and selling it at higher prices. This may save the farmer more than €50 ha⁻¹. In situations where higher grain qualities can only be obtained at the expense of yield penalties, premiums for higher grain qualities can create incentives for fertilizer rates beyond the yield-maximizing rate. GS technologies may even boost this effect.
    Keywords: site-specific nitrogen management, wheat quality, grain segregation, Crop Production/Industries
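
    Why site-specific fertilization can beat uniform management is easiest to see with a worked example. A minimal sketch, assuming quadratic yield response functions y_i(N) = a_i + b_i*N - c_i*N**2 and entirely made-up parameters and prices (the paper's response functions and quality premiums are not reproduced, and grain segregation is not modeled): SSF chooses the profit-maximizing rate per subfield, UM one rate for the whole field.

```python
import numpy as np

# Hypothetical quadratic yield response per subfield (t/ha; N in kg/ha)
a = np.array([4.0, 6.0, 7.5])
b = np.array([0.040, 0.030, 0.020])
c = np.array([1.2e-4, 1.0e-4, 0.8e-4])
w = np.array([0.3, 0.4, 0.3])   # area shares of the subfields (sum to 1)
p_grain = 200.0                  # hypothetical grain price per t
p_n = 1.0                        # hypothetical fertilizer price per kg N

def profit(N):
    """Per-ha profit of each subfield at rate(s) N: revenue minus fertilizer cost."""
    return p_grain * (a + b * N - c * N ** 2) - p_n * N

# SSF: first-order condition p*(b_i - 2*c_i*N) = r gives N_i = (b_i - r/p) / (2*c_i)
n_ssf = np.maximum((b - p_n / p_grain) / (2 * c), 0.0)
# UM: a single rate maximizing the area-weighted profit
n_um = max((w @ b - p_n / p_grain) / (2 * (w @ c)), 0.0)

gain = w @ profit(n_ssf) - w @ profit(n_um)
print("SSF rates:", n_ssf, "UM rate:", n_um)
print("SSF advantage per ha:", gain)   # about 7.81 with these made-up numbers
```

    With a quadratic response the loss from applying the uniform rate on subfield i is p_grain * c_i * (N_um - N_i)**2, so the SSF advantage grows with the heterogeneity of the subfields' optimal rates.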

    Cost Efficient Tillage and Rotation Options for Mitigating GHG Emissions from Agriculture in Eastern Canada

    The economic efficiency of cropping options to mitigate GHG emissions from agriculture in Eastern Canada was analyzed. Data on yield response to tillage (moldboard plow and chisel plow) and six corn-based rotations were obtained from a 20-year field experiment in Ontario. Budgets were constructed for each cropping system; GHG emissions were measured for soil carbon and estimated for nitrous oxide according to IPCC methodology. Complex crop rotations with legumes, such as corn-corn-soybeans-wheat with underseeded red clover, have higher net returns and substantially lower GHG emissions (by more than 1 Mg ha⁻¹ year⁻¹) than continuous corn. Reduced tillage reduces GHG emissions due to lower input use, but no soil sequestration effect of tillage could be found. Rotation had a much bigger effect on the mitigation potential of GHG emissions than tillage. However, opportunity costs of more than $200 per Mg CO2 eq ha⁻¹ year⁻¹ indicate the limits to increasing the mitigation potential beyond the level of the economically best cropping system.
    Keywords: Environmental Economics and Policy
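
    The opportunity cost quoted above is a ratio of forgone net returns to avoided emissions. A minimal sketch, assuming per-ha, per-year net returns in $ and emissions in Mg CO2 eq (all figures below are hypothetical, not results from the study): a positive value is the cost of each extra Mg abated, while a negative value signals a win-win switch with both higher returns and lower emissions.

```python
def abatement_cost(net_return_base, emissions_base, net_return_alt, emissions_alt):
    """Opportunity cost ($ per Mg CO2 eq) of switching from a base cropping system
    to an alternative; inputs are per ha and per year."""
    reduction = emissions_base - emissions_alt
    if reduction <= 0:
        raise ValueError("alternative does not reduce emissions")
    return (net_return_base - net_return_alt) / reduction

# Hypothetical: moving beyond the economically best system costs net returns
print(abatement_cost(500.0, 2.0, 440.0, 1.7))   # about 200 $ per Mg CO2 eq
# Hypothetical: a rotation that raises returns AND lowers emissions (negative cost)
print(abatement_cost(450.0, 3.2, 500.0, 2.0))
```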

    Effectiveness of Best Management Cropping Systems to Abate Greenhouse Gas Emissions

    Best management practices (BMPs) for cropping systems that involve conservation tillage and nutrient management are proposed as potential win-win solutions for both farmers and the environment. While originally targeted as a means for improving soil and water quality, these BMPs may also contribute to the mitigation of greenhouse gases (GHGs). Mitigation efforts have focused primarily on the ability of BMPs to sequester carbon and on the potential revenue that carbon sequestration may subsequently represent for farmers. Increasingly, evidence from experimental stations calls into question the potential for carbon sequestration with reduced tillage in the soils of Eastern Canada. However, there are other ways in which BMPs can reduce GHG emissions: lowering fuel and nitrogen fertilizer consumption and, potentially, lowering emissions of nitrous oxide from the soil. This article examines the profitability and emission reduction potential of best management cropping practices for Ontario.
    Keywords: Agricultural and Food Policy, Farm Management
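
    The non-sequestration pathways listed above (less fuel, less nitrogen fertilizer, less soil N2O) can be combined in a simple per-ha emission budget. A minimal sketch, assuming the IPCC 2006 Tier 1 default of 0.01 kg N2O-N per kg N applied, the 44/28 conversion from N2O-N to N2O, a 100-year GWP of 298 for N2O (IPCC AR4), roughly 2.68 kg CO2 per litre of diesel, and a purely hypothetical 4.0 kg CO2 eq per kg N for fertilizer manufacture; the input quantities are invented for illustration and are not from the article.

```python
N2O_N_TO_N2O = 44.0 / 28.0   # mass conversion from N2O-N to N2O
EF1 = 0.01                   # IPCC 2006 Tier 1 default: kg N2O-N per kg N applied
GWP_N2O = 298.0              # 100-year global warming potential of N2O (IPCC AR4)

def direct_n2o_co2eq(n_applied_kg_ha):
    """Direct soil N2O emissions from applied N, in kg CO2 eq per ha (Tier 1 style)."""
    return n_applied_kg_ha * EF1 * N2O_N_TO_N2O * GWP_N2O

def system_emissions(n_kg_ha, diesel_l_ha, ef_diesel=2.68, ef_fert=4.0):
    """Hypothetical per-ha budget in kg CO2 eq: fuel + fertilizer manufacture + direct N2O.
    ef_fert is an assumed placeholder; published values vary widely."""
    return diesel_l_ha * ef_diesel + n_kg_ha * ef_fert + direct_n2o_co2eq(n_kg_ha)

conventional = system_emissions(n_kg_ha=150, diesel_l_ha=60)   # hypothetical inputs
bmp = system_emissions(n_kg_ha=120, diesel_l_ha=35)            # hypothetical inputs
print("reduction, kg CO2 eq per ha:", conventional - bmp)
```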

    Simulating a Close-to-Reality Synthetic Population of the Greater Accra Region

    The purpose of this research is to simulate a synthetic population of the Greater Accra Metropolitan Region (GAMA) from the 2005 Ghana Living Standards Survey (GLSS5) for use in the Greater Accra Urban Simulation System (GAUSS). A primary goal in simulating the synthetic population of GAMA is to employ a method which generates close-to-reality population data rather than repeatedly drawing samples. In order to generate close-to-reality synthetic data, combinations which were not represented in the original household survey but are likely to occur in the true population must occur in the synthetically generated data. The author estimates the conditional distributions with multinomial logistic regression models in order to simulate categorical and continuous variables. Random zeros, as opposed to structural zeros, are also reflected in the synthetically generated Greater Accra population. One of the main reasons for avoiding pure replication of units from the underlying sample is that it generally leads to small variability of units within smaller subgroups, which increases unrealistic model behavior when the population data are used as input for agent-based simulations of urban dynamics.
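
    The distinction between random and structural zeros can be made concrete at the sampling step. A minimal sketch, assuming model-based category probabilities are already available (the matrix below is invented) and that impossible combinations are known in advance as a boolean mask: structural zeros are set to probability 0 and each row is renormalized, while combinations that are merely unobserved in the survey (random zeros) keep their model probability. The age rule is hypothetical and only illustrates the mechanism.

```python
import numpy as np

def apply_structural_zeros(P, allowed):
    """Zero out impossible categories (structural zeros) and renormalize each row.
    Categories merely unobserved in the survey (random zeros) keep their model
    probability, so they can still be generated."""
    P = np.where(allowed, P, 0.0)
    totals = P.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("a row has no admissible category")
    return P / totals

# Education categories: none, primary, secondary, tertiary (hypothetical model output)
age = np.array([4, 30, 45])
P = np.array([[0.70, 0.20, 0.07, 0.03],
              [0.10, 0.30, 0.40, 0.20],
              [0.20, 0.30, 0.30, 0.20]])
allowed = np.ones_like(P, dtype=bool)
allowed[age < 6, 2:] = False   # hypothetical rule: no secondary/tertiary under age 6
P_adj = apply_structural_zeros(P, allowed)
print(P_adj)
```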